Overview

Dataset Statistics

Number of Variables 11
Number of Rows 1328800
Missing Cells 1426747
Missing Cells (%) 9.8%
Duplicate Rows 0
Duplicate Rows (%) 0.0%
Total Size in Memory 591.0 MB
Average Row Size in Memory 466.3 B
Variable Types
  • Numerical: 4
  • Categorical: 7
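The overview figures can be cross-checked against the per-column sections. The sketch below makes two assumptions. First, the only missing cells are the three per-column counts listed under Dataset Insights, since every other column reports Missing 0. Second, "MB" here means 2^20 bytes, which is the reading that reproduces the 466.3 B average row size.

```python
# Missing counts per column, copied from the Dataset Insights list
missing_by_column = {
    "Administration end date": 1313295,
    "Dose administered": 37425,
    "Dose unit administered": 76027,
}
n_rows, n_cols = 1328800, 11

missing_cells = sum(missing_by_column.values())
missing_pct = 100 * missing_cells / (n_rows * n_cols)

# Assumption: the report's "MB" is 2**20 bytes
avg_row_bytes = 591.0 * 2**20 / n_rows

print(missing_cells)            # 1426747
print(round(missing_pct, 1))    # 9.8
print(round(avg_row_bytes))     # 466
```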

Dataset Insights

  • Administration end date has 1313295 (98.83%) missing values
  • Dose administered has 37425 (2.82%) missing values
  • Dose unit administered has 76027 (5.72%) missing values
  • Unnamed: 0 is skewed
  • Internalpatientid is skewed
  • Dose administered is skewed
  • Administration date has a high cardinality: 1272518 distinct values
  • Administration end date has a high cardinality: 12308 distinct values
  • Administered medication atc 5 has a high cardinality: 977 distinct values
  • Dose form has a high cardinality: 132 distinct values
  • Dose unit administered has a high cardinality: 15467 distinct values
  • Administration date has constant length 21
  • Administration end date has constant length 21
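pandas `read_csv` produces a column named "Unnamed: 0" when the first field of the CSV header is empty. This typically happens when the file was written together with its row index. Here its values are all distinct and range from 2451 to 159704507, which suggests, but does not prove, an index carried over from a larger source table. Below is a minimal standard-library sketch for spotting and dropping such a column before profiling. The helper name `drop_blank_index_column` is hypothetical. In pandas, `read_csv(path, index_col=0)` instead turns that first column into the index.

```python
import csv
import io

def drop_blank_index_column(text):
    """Parse CSV text and drop the first column if its header field is empty
    (the pattern pandas reports as "Unnamed: 0"). Hypothetical helper."""
    rows = list(csv.reader(io.StringIO(text)))
    if rows and rows[0] and rows[0][0].strip() == "":
        rows = [row[1:] for row in rows]
    return rows

# Illustrative input, not data from this report
sample = ",patient_id,state\n0,1,A\n"
print(drop_blank_index_column(sample))  # [['patient_id', 'state'], ['1', 'A']]
```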

Variables


Unnamed: 0

numerical

Approximate Distinct Count 1328800
Approximate Unique (%) 100.0%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 21260800 B
Mean 8.3438e+07
Minimum 2451
Maximum 159704507
Zeros 0
Zeros (%) 0.0%
Negatives 0
Negatives (%) 0.0%
  • Unnamed: 0 is skewed left (γ1 = -0.0807)

Quantile Statistics

Minimum 2451
5th Percentile 5.8304e+06
Q1 5.5512e+07
Median 7.4312e+07
Q3 1.2054e+08
95th Percentile 1.5196e+08
Maximum 159704507
Range 159702056
IQR 6.5031e+07

Descriptive Statistics

Mean 8.3438e+07
Standard Deviation 4.3794e+07
Variance 1.9179e+15
Sum 1.1087e+14
Skewness -0.0807
Kurtosis -0.9831
Coefficient of Variation 0.5249
  • Unnamed: 0 is not normally distributed (p-value 7.567973028759955e-14)
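Several descriptive figures are derived from others: Range = Maximum - Minimum, IQR = Q3 - Q1, Variance = Standard Deviation squared, and Coefficient of Variation = Standard Deviation / Mean. The check below uses the rounded Unnamed: 0 figures from this section, so the derived values agree with the report only to the displayed precision.

```python
minimum, maximum = 2451, 159704507
q1, q3 = 5.5512e7, 1.2054e8
mean, std = 8.3438e7, 4.3794e7

value_range = maximum - minimum   # 159702056, exact
iqr = q3 - q1                     # about 6.503e7 (report: 6.5031e+07)
variance = std ** 2               # about 1.9179e15
cv = std / mean                   # about 0.5249

print(value_range, f"{iqr:.4e}", f"{variance:.4e}", round(cv, 4))
```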

Internalpatientid

numerical

Approximate Distinct Count 626
Approximate Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 21260800 B
Mean 85875.3525
Minimum 67
Maximum 168008
Zeros 0
Zeros (%) 0.0%
Negatives 0
Negatives (%) 0.0%
  • Internalpatientid is skewed right (γ1 = 0.0492)

Quantile Statistics

Minimum 67
5th Percentile 16588
Q1 39490
Median 87166
Q3 135461
95th Percentile 156832
Maximum 168008
Range 167941
IQR 95971

Descriptive Statistics

Mean 85875.3525
Standard Deviation 51240.0983
Variance 2.6255e+09
Sum 1.1411e+11
Skewness 0.04922
Kurtosis -1.3416
Coefficient of Variation 0.5967
  • Internalpatientid is not normally distributed (p-value 9.458914325682099e-15)
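Internalpatientid is an identifier, so its mean, skewness and normality test describe how IDs were assigned rather than anything about the patients. With 626 distinct IDs across 1328800 rows, the report implies about 2123 administrations per patient on average. The sketch below treats the ID as a label and counts rows per patient. The input list is illustrative only.

```python
from collections import Counter

def administrations_per_patient(patient_ids):
    """Count rows per patient, treating the ID as a label rather than a quantity."""
    return Counter(patient_ids)

ids = [101, 101, 202, 101, 202, 303]  # illustrative IDs only
counts = administrations_per_patient(ids)
print(counts.most_common(1))        # [(101, 3)]
print(round(1328800 / 626, 1))      # 2122.7 rows per patient on average in the report
```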

Age at med administration

numerical

Approximate Distinct Count 1328286
Approximate Unique (%) 100.0%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 21260800 B
Mean 68.9471
Minimum 29.3609
Maximum 104.1768
Zeros 0
Zeros (%) 0.0%
Negatives 0
Negatives (%) 0.0%
  • Age at med administration is skewed right (γ1 = 0.2198)

Quantile Statistics

Minimum 29.3609
5th Percentile 50.5163
Q1 59.6267
Median 67.5407
Q3 78.3168
95th Percentile 89.9529
Maximum 104.1768
Range 74.8159
IQR 18.6902

Descriptive Statistics

Mean 68.9471
Standard Deviation 12.4639
Variance 155.3483
Sum 9.1617e+07
Skewness 0.2198
Kurtosis -0.6434
Coefficient of Variation 0.1808
  • Age at med administration is not normally distributed (p-value 0.0002462904460626387)
  • Age at med administration has 7 outliers
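The report does not say how outliers are defined. Assume Tukey's rule, which flags values below Q1 - 1.5 × IQR or above Q3 + 1.5 × IQR. The reported quartiles then give a lower fence of about 31.59 and an upper fence of about 106.35. The maximum (104.1768) falls inside the upper fence and the minimum (29.3609) falls below the lower one. Under this assumption, all 7 outliers would therefore be unusually young ages.

```python
def tukey_fences(q1, q3, k=1.5):
    """Return (lower, upper) outlier fences using Tukey's k * IQR rule."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

lower, upper = tukey_fences(59.6267, 78.3168)  # Age quartiles from the report
print(round(lower, 2), round(upper, 2))        # 31.59 106.35
```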

Administration date

categorical

Approximate Distinct Count 1272518
Approximate Unique (%) 95.8%
Missing 0
Missing (%) 0.0%
Memory Size 114276800 B

Length

Mean 21
Standard Deviation 0
Median 21
Minimum 21
Maximum 21

Sample

1st row 2003-06-21 08:54:1...
2nd row 2004-10-25 15:30:1...
3rd row 2007-02-27 10:05:3...
4th row 2007-08-27 07:29:4...
5th row 2007-08-27 07:44:5...

Letter

Count 0
Lowercase Letter 0
Space Separator 1328800
Uppercase Letter 0
Dash Punctuation 2657600
Decimal Number 19932000
  • Administration date has words of constant length
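Administration date is stored as text, which is why it is profiled as categorical. Every value is 21 characters long and contains 15 digits, 2 dashes and 1 space. That fits a layout such as YYYY-MM-DD HH:MM:SS.f, but the samples are truncated, so the exact layout is an assumption. Converting the values to real timestamps makes date ranges and time-of-day analysis possible.

```python
from datetime import datetime

# Assumed layout: date, time and a fractional second, 21 characters in total
FMT = "%Y-%m-%d %H:%M:%S.%f"

def parse_admin_date(text):
    """Parse one Administration date string under the assumed layout."""
    return datetime.strptime(text, FMT)

ts = parse_admin_date("2010-01-01 12:00:00.0")  # hypothetical value
print(ts.isoformat())  # 2010-01-01T12:00:00
```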

Administration end date

categorical

Approximate Distinct Count 12308
Approximate Unique (%) 79.4%
Missing 1313295
Missing (%) 98.8%
Memory Size 1333430 B

Length

Mean 21
Standard Deviation 0
Median 21
Minimum 21
Maximum 21

Sample

1st row 2014-12-19 23:28:0...
2nd row 2014-12-20 02:01:3...
3rd row 2014-12-24 16:47:0...
4th row 2014-12-25 02:59:1...
5th row 2014-12-26 18:24:2...

Letter

Count 0
Lowercase Letter 0
Space Separator 15505
Uppercase Letter 0
Dash Punctuation 31010
Decimal Number 232575
  • Administration end date has words of constant length
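Only 15505 rows (about 1.2%) have an Administration end date. Where both dates are present, an administration duration can be derived. Rows without an end date are better kept as missing than treated as zero-length. The sketch assumes both date columns use the layout YYYY-MM-DD HH:MM:SS.f, which is not confirmed by the truncated samples, and uses hypothetical values.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S.%f"  # assumed layout of both date columns

def administration_minutes(start, end):
    """Return the duration in minutes, or None when the end date is missing."""
    if end is None:
        return None
    delta = datetime.strptime(end, FMT) - datetime.strptime(start, FMT)
    return delta.total_seconds() / 60

# Hypothetical values, not rows from the dataset
print(administration_minutes("2015-01-01 08:00:00.0", "2015-01-01 08:30:00.0"))  # 30.0
print(administration_minutes("2015-01-01 08:00:00.0", None))                     # None
```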

Administered medication atc 5

categorical

Approximate Distinct Count 977
Approximate Unique (%) 0.1%
Missing 0
Missing (%) 0.0%
Memory Size 106555119 B

Length

Mean 15.189
Standard Deviation 10.1704
Median 11
Minimum 4
Maximum 252

Sample

1st row venlafaxine
2nd row gemfibrozil
3rd row morphine
4th row omeprazole
5th row insulins and analo...

Letter

Count 18807774
Lowercase Letter 18733263
Space Separator 915896
Uppercase Letter 74511
Dash Punctuation 76651
Decimal Number 7711
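Most characters in Administered medication atc 5 are lowercase, but 74511 uppercase letters also appear. Some names may therefore differ only in case or spacing and inflate the 977 distinct values. The report does not show whether this actually happens. The normalization sketch below could be used to check it; `normalize_name` is a hypothetical helper, and the names are illustrative.

```python
import re

def normalize_name(name):
    """Trim, collapse internal whitespace and casefold so case/spacing variants match."""
    return re.sub(r"\s+", " ", name.strip()).casefold()

names = ["Morphine", "morphine", " morphine ", "insulin  glargine"]  # illustrative
print(sorted({normalize_name(n) for n in names}))  # ['insulin glargine', 'morphine']
```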

Administration status

categorical

Approximate Distinct Count 10
Approximate Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Memory Size 93459587 B
  • The most frequent value (Given) occurs over 19.08 times as often as the second most frequent value (Held)

Length

Mean 5.3338
Standard Deviation 2.6209
Median 5
Minimum 4
Maximum 27

Sample

1st row Given
2nd row Given
3rd row Held
4th row Given
5th row Held

Letter

Count 7002944
Lowercase Letter 5674144
Space Separator 46005
Uppercase Letter 1328800
Dash Punctuation 0
Decimal Number 0
  • The top 2 categories (Given, Held) account for over 50.0% of the values
  • The most frequent value (given) occurs over 19.09 times as often as the second most frequent value (held)
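The "times as often" insight is the ratio of the counts of the two most frequent values. The two versions above differ slightly (19.08 vs 19.09) and use different capitalization. This presumably means they are computed on differently processed text, but the report does not say how. The sketch below shows the ratio on illustrative counts.

```python
from collections import Counter

def top_two_ratio(values):
    """Ratio of the most frequent value's count to the second most frequent's."""
    (top, n1), (second, n2) = Counter(values).most_common(2)
    return top, second, n1 / n2

statuses = ["Given"] * 19 + ["Held"]  # illustrative counts only
print(top_two_ratio(statuses))  # ('Given', 'Held', 19.0)
```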

Dose form

categorical

Approximate Distinct Count 132
Approximate Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Memory Size 93921904 B
  • The most frequent value (tab) occurs over 4.24 times as often as the second most frequent value (inj)

Length

Mean 5.6817
Standard Deviation 3.0582
Median 8
Minimum 3
Maximum 30

Sample

1st row tab
2nd row tab,oral
3rd row inj
4th row cap,sa
5th row inj

Letter

Count 6901565
Lowercase Letter 6901565
Space Separator 7328
Uppercase Letter 0
Dash Punctuation 6634
Decimal Number 1686
  • The top 2 categories (tab, inj) account for over 50.0% of the values
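The "top 2 categories" insight is a cumulative share: the combined count of the two most frequent values divided by the number of values. Dose form has no missing values, so the choice of denominator does not matter here. The sketch divides by the non-missing values, which is an assumption. The input list is illustrative.

```python
from collections import Counter

def top_k_share(values, k=2):
    """Fraction of non-missing values covered by the k most frequent categories."""
    present = [v for v in values if v is not None]
    counts = Counter(present)
    return sum(n for _, n in counts.most_common(k)) / len(present)

forms = ["tab", "tab", "tab", "inj", "cap,sa", "tab,oral", None]  # illustrative
print(round(top_k_share(forms), 3))  # 0.667
```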

Dose administered

numerical

Approximate Distinct Count 24429
Approximate Unique (%) 1.9%
Missing 37425
Missing (%) 2.8%
Infinite 0
Infinite (%) 0.0%
Memory Size 20662000 B
Mean 18.7513
Minimum 0
Maximum 57036
Zeros 63871
Zeros (%) 4.8%
Negatives 0
Negatives (%) 0.0%
  • Dose administered is skewed right (γ1 = 72.7572)

Quantile Statistics

Minimum 0
5th Percentile 0.4919
Q1 1
Median 1
Q3 1
95th Percentile 3.356
Maximum 57036
Range 57036
IQR 0

Descriptive Statistics

Mean 18.7513
Standard Deviation 323.2971
Variance 104521.023
Sum 2.4215e+07
Skewness 72.7572
Kurtosis 6564.9947
Coefficient of Variation 17.2414
  • Dose administered is not normally distributed (p-value 4.226725107822298e-25)
  • Dose administered has 304856 outliers
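Q1, the median and Q3 all equal 1, so the IQR is 0. The report does not define its outlier rule. If the 304856 outliers come from a Tukey-style rule, both fences collapse to 1 and every dose other than exactly 1 is flagged. In that case, the outlier count measures how often the dose differs from 1 rather than how many doses are extreme. The sketch shows this effect on illustrative doses.

```python
def tukey_outliers(values, q1, q3, k=1.5):
    """Return the values outside Tukey's fences [q1 - k*IQR, q3 + k*IQR]."""
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]

doses = [1, 1, 1, 0, 0.5, 2, 1, 57036]     # illustrative doses
print(tukey_outliers(doses, q1=1, q3=1))   # [0, 0.5, 2, 57036]
```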

Dose unit administered

categorical

Approximate Distinct Count 15467
Approximate Unique (%) 1.2%
Missing 76027
Missing (%) 5.7%
Memory Size 87800788 B
  • The most frequent value (tab) occurs over 6.95 times as often as the second most frequent value (cap,oral)

Length

Mean 5.0852
Standard Deviation 3.2256
Median 4
Minimum 1
Maximum 40

Sample

1st row tab
2nd row tab,oral
3rd row cap,sa
4th row tab
5th row 1drop

Letter

Count 5220008
Lowercase Letter 5093730
Space Separator 296697
Uppercase Letter 126278
Dash Punctuation 595
Decimal Number 563413
  • The most frequent value (tab) occurs over 5.99 times as often as the second most frequent value (1)
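Dose unit administered mixes plain units (tab) with values that embed a quantity (1drop), and the second insight above names "1" as a runner-up value. Numbers therefore appear to leak into the unit field, which may contribute to the 15467 distinct values. The sketch below separates a leading number from the unit text. The helper `split_quantity` and its regular expression are hypothetical, not part of the report.

```python
import re

# Optional leading number (integer or decimal), optional space, then the unit text
LEADING_QTY = re.compile(r"^\s*(\d+(?:\.\d+)?)\s*(.*?)\s*$")

def split_quantity(unit):
    """Return (quantity or None, unit text) for values such as '1drop'."""
    match = LEADING_QTY.match(unit)
    if match:
        return float(match.group(1)), match.group(2)
    return None, unit.strip()

print(split_quantity("1drop"))     # (1.0, 'drop')
print(split_quantity("tab,oral"))  # (None, 'tab,oral')
```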

State

categorical

Approximate Distinct Count 49
Approximate Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Memory Size 98634191 B

Length

Mean 9.228
Standard Deviation 2.726
Median 14
Minimum 4
Maximum 20

Sample

1st row New Mexico
2nd row New Mexico
3rd row New Mexico
4th row New Mexico
5th row New Mexico

Letter

Count 11968684
Lowercase Letter 10349451
Space Separator 293507
Uppercase Letter 1619233
Dash Punctuation 0
Decimal Number 0
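The Letter blocks use Unicode general-category names: Lowercase Letter (Ll), Uppercase Letter (Lu), Space Separator (Zs), Dash Punctuation (Pd) and Decimal Number (Nd). In every section, "Count" equals Lowercase Letter plus Uppercase Letter; for State, 10349451 + 1619233 = 11968684. Characters in other categories, such as ':' or ',', are not tallied. The standard library can reproduce these counts, assuming the profiler classifies characters the same way.

```python
import unicodedata
from collections import Counter

CATEGORY_NAMES = {
    "Ll": "Lowercase Letter",
    "Lu": "Uppercase Letter",
    "Zs": "Space Separator",
    "Pd": "Dash Punctuation",
    "Nd": "Decimal Number",
}

def letter_profile(values):
    """Tally characters by Unicode general category, as in the Letter blocks."""
    tally = Counter()
    for value in values:
        for ch in value:
            name = CATEGORY_NAMES.get(unicodedata.category(ch))
            if name:  # other categories (e.g. ':' or ',') are ignored
                tally[name] += 1
    tally["Count"] = tally["Lowercase Letter"] + tally["Uppercase Letter"]
    return dict(tally)

print(letter_profile(["New Mexico"]))
# {'Uppercase Letter': 2, 'Lowercase Letter': 7, 'Space Separator': 1, 'Count': 9}
```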

Interactions, Correlations, Missing Values: no content in this text version.